Chow Test

The Chow test, proposed by econometrician Gregory Chow in 1960, is a test of whether the true coefficients in two linear regressions on different data sets are equal. In econometrics, it is most commonly used in time series analysis to test for the presence of a structural break at a period which can be assumed to be known ''a priori'' (for instance, a major historical event such as a war). In program evaluation, the Chow test is often used to determine whether the independent variables have different impacts on different subgroups of the population.


Illustrations


First Chow Test

Suppose that we model our data as

: y_t = a + b x_{1t} + c x_{2t} + \varepsilon.

If we split our data into two groups, then we have

: y_t = a_1 + b_1 x_{1t} + c_1 x_{2t} + \varepsilon

and

: y_t = a_2 + b_2 x_{1t} + c_2 x_{2t} + \varepsilon.

The null hypothesis of the Chow test asserts that a_1 = a_2, b_1 = b_2, and c_1 = c_2, under the assumption that the model errors \varepsilon are independent and identically distributed, drawn from a normal distribution with unknown variance.

Let S_C be the sum of squared residuals from the combined data, S_1 the sum of squared residuals from the first group, and S_2 the sum of squared residuals from the second group. N_1 and N_2 are the numbers of observations in each group and k is the total number of parameters (in this case 3, i.e. 2 independent-variable coefficients plus the intercept). Then the Chow test statistic is

: \frac{(S_C - (S_1 + S_2))/k}{(S_1 + S_2)/(N_1 + N_2 - 2k)}.

The test statistic follows the ''F''-distribution with k and N_1 + N_2 - 2k degrees of freedom.
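The computation above can be sketched directly with ordinary least squares. The following is a minimal illustration using NumPy; the function name chow_statistic, the helper ssr, and the toy data set (a series whose intercept and slope change at a known break point) are our own illustrative choices, not from the source:

```python
import numpy as np

def chow_statistic(y, X, split):
    """Chow F statistic for a break known to occur after observation `split`.

    y is the (N,) response and X the (N, k) design matrix with an intercept
    column included, so k counts all estimated parameters as in the text.
    """
    def ssr(y, X):
        # sum of squared residuals of an OLS fit
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        r = y - X @ beta
        return r @ r

    k = X.shape[1]
    N1, N2 = split, len(y) - split
    S_C = ssr(y, X)                     # combined data
    S_1 = ssr(y[:split], X[:split])     # first group
    S_2 = ssr(y[split:], X[split:])     # second group
    return ((S_C - (S_1 + S_2)) / k) / ((S_1 + S_2) / (N1 + N2 - 2 * k))

# Hypothetical toy series with an obvious break at t = 20.
rng = np.random.default_rng(0)
t = np.arange(40.0)
y = np.where(t < 20, 1.0 + 0.5 * t, 5.0 + 1.5 * t) + rng.normal(0.0, 0.1, 40)
X = np.column_stack([np.ones_like(t), t])
F = chow_statistic(y, X, split=20)
```

The resulting statistic would then be compared with the upper tail of the F(k, N_1 + N_2 - 2k) distribution; a large value leads to rejecting equality of the coefficients across the two groups.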
The same result can be achieved via dummy variables. Consider the two data sets which are being compared: first there is the 'primary' data set i = 1, \ldots, n_1 and the 'secondary' data set i = n_1 + 1, \ldots, n. Then there is the union of these two sets: i = 1, \ldots, n. If there is no structural change between the primary and secondary data sets, a regression can be run over the union without the issue of biased estimators arising.

Consider the regression

: y_t = \beta_0 + \beta_1 x_{1t} + \beta_2 x_{2t} + \cdots + \beta_k x_{kt} + \gamma_0 D_t + \sum_{i=1}^{k} \gamma_i x_{it} D_t + \varepsilon_t,

which is run over i = 1, \ldots, n. D_t is a dummy variable taking the value 1 for i = n_1 + 1, \ldots, n and 0 otherwise. If both data sets can be explained fully by (\beta_0, \beta_1, \ldots, \beta_k), then there is no use for the dummy variable, as the data set is explained fully by the restricted equation. That is, under the assumption of no structural change we have the null and alternative hypotheses

: H_0: \gamma_0 = 0, \gamma_1 = 0, \ldots, \gamma_k = 0
: H_1: \text{at least one } \gamma_i \neq 0.

The null hypothesis of joint insignificance of D can be tested with an ''F''-test with k + 1 and n - 2(k+1) degrees of freedom (DoF):

: F = \frac{(SSE_R - SSE_U)/(k+1)}{SSE_U/(n - 2(k+1))}.

Remarks
* The global sum of squares (SSE) is often called the Restricted Sum of Squares (RSSM), as we basically test a constrained model with k + 1 restrictions (with k the number of regressors).
* Some software, like SAS, will use a predictive Chow test when the size of a subsample is less than the number of regressors.
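The dummy-variable formulation can be sketched the same way: the restricted model omits D_t and the interaction terms, and the F-test of their joint insignificance reproduces the first Chow statistic, because the fully interacted model fits each subsample separately (so SSE_U equals S_1 + S_2 and SSE_R equals S_C). A minimal sketch with a single regressor (k = 1) and hypothetical data:

```python
import numpy as np

def ssr(y, X):
    # sum of squared residuals of an OLS fit
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

rng = np.random.default_rng(1)
n, n1 = 60, 30                            # union size and primary-set size
x = rng.normal(size=n)
D = (np.arange(n) >= n1).astype(float)    # 1 on the secondary data set
y = np.where(D == 0, 1.0 + 2.0 * x, 3.0 - 1.0 * x) + rng.normal(0.0, 0.2, n)

# Restricted model: common beta_0, beta_1 over the union.
X_r = np.column_stack([np.ones(n), x])
# Unrestricted model: adds gamma_0 * D and the interaction gamma_1 * x * D.
X_u = np.column_stack([np.ones(n), x, D, x * D])

q = 2  # number of restrictions gamma_0 = gamma_1 = 0, i.e. k + 1 with k = 1
SSE_R, SSE_U = ssr(y, X_r), ssr(y, X_u)
F = ((SSE_R - SSE_U) / q) / (SSE_U / (n - 2 * q))
```

Here n - 2q = n - 2(k+1) matches N_1 + N_2 - 2k from the first formulation, since each subsample regression estimates k + 1 parameters.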




External links

Computing the Chow statistic
Series of FAQ explanations from the Stata Corporation at https://www.stata.com/support/faqs/

Series of FAQ explanations from the SAS Institute